Tutorial (Part 1): Visualizing Attacks on a Honey Pot

In this first part, we:

  1. Import Graphistry and load a CSV of log entries
  2. Visualize it as a graph by treating each row as an edge
  3. Color edges using a categorical palette based on the kind of alert
  4. Create a nodes table to control node sizes and colors

You can download this notebook to run it locally.


In [1]:
import pandas
import graphistry

# You can also set your API key once and for all in the environment variable "GRAPHISTRY_API_KEY".
#graphistry.register(key='<go to www.graphistry.com/api-request to get a key>', server='labs.graphistry.com')

Load data with Pandas


In [3]:
logs = pandas.read_csv('.././../data/honeypot.csv')
logs[:3] # Show the first three rows of the loaded dataframe


Out[3]:
attackerIP victimIP victimPort vulnName count time(max) time(min)
0 1.235.32.141 172.31.14.66 139.0 MS08067 (NetAPI) 6 1.421434e+09 1.421423e+09
1 105.157.235.22 172.31.14.66 445.0 MS08067 (NetAPI) 4 1.422498e+09 1.422495e+09
2 105.186.127.152 172.31.14.66 445.0 MS04011 (LSASS) 1 1.419966e+09 1.419966e+09

Dates in time(max) and time(min) are unix timestamps. Pandas helps parse them.


In [3]:
logs['time(max)'] = pandas.to_datetime(logs['time(max)'], unit='s')
logs['time(min)'] = pandas.to_datetime(logs['time(min)'], unit='s')
logs[:3]


Out[3]:
attackerIP victimIP victimPort vulnName count time(max) time(min)
0 1.235.32.141 172.31.14.66 139 MS08067 (NetAPI) 6 2015-01-16 18:39:37 2015-01-16 15:37:49
1 105.157.235.22 172.31.14.66 445 MS08067 (NetAPI) 4 2015-01-29 02:15:35 2015-01-29 01:25:55
2 105.186.127.152 172.31.14.66 445 MS04011 (LSASS) 1 2014-12-30 18:59:20 2014-12-30 18:59:20

Minimal Graph

To create a graph, we bind the columns attackerIP and victimIP to indicate the start/end of edges. The result is a graph of IPs connected by log entries.
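As a rough illustration of what "each row is an edge" means, the sketch below counts edges and nodes in a small edge table using only pandas. The table toy_logs and its IP addresses are made up for this example and are not taken from honeypot.csv.

```python
import pandas

# Toy edge table with the same two columns as the honeypot logs (made-up IPs).
toy_logs = pandas.DataFrame({
    'attackerIP': ['10.0.0.1', '10.0.0.2', '10.0.0.1'],
    'victimIP':   ['10.0.0.9', '10.0.0.9', '10.0.0.9'],
})

# Every row is one edge, so repeated attacker/victim pairs become multi-edges.
edge_count = len(toy_logs)

# The nodes are the distinct values found across the source and destination columns.
node_ids = pandas.concat([toy_logs['attackerIP'], toy_logs['victimIP']]).unique()
node_count = len(node_ids)

print(edge_count, node_count)
```

In this toy table, two rows share the same attacker and victim, so the graph has more edges than distinct IP pairs.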


In [4]:
g = graphistry.bind(source='attackerIP', destination='victimIP').edges(logs)
g.plot()


Out[4]:

Coloring Edges by Vulnerability

We compute the desired edge colors by creating a new column (ecolor) that assigns each vulnerability name a distinct color code. We then tell the plotter to override the default edge coloring by binding this column to the edge_color attribute.

See the list of color codes at https://graphistry.github.io/docs/legacy/api/0.9.2/api.html#extendedpalette


In [5]:
vulnerabilityToColorCode = {vulnName: idx for idx, vulnName in enumerate(logs.vulnName.unique())}
vulnerabilityToColorCode


Out[5]:
{'DCOM Vulnerability': 7,
 'HTTP Vulnerability': 8,
 'IIS Vulnerability': 3,
 'MS04011 (LSASS)': 1,
 'MS08067 (NetAPI)': 0,
 'MYDOOM Vulnerability': 4,
 'MaxDB Vulnerability': 2,
 'SYMANTEC Vulnerability': 5,
 'TIVOLI Vulnerability': 6}
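As an aside, pandas.factorize can produce the same kind of integer codes in one call. This is an alternative sketch rather than the approach used in this notebook, and the vuln_names series below is a made-up sample rather than the real log column.

```python
import pandas

# Made-up sample of vulnerability names (stand-in for logs.vulnName).
vuln_names = pandas.Series([
    'MS08067 (NetAPI)',
    'MS08067 (NetAPI)',
    'MS04011 (LSASS)',
    'MaxDB Vulnerability',
])

# factorize numbers each distinct value in order of first appearance.
codes, uniques = pandas.factorize(vuln_names)

# The dictionary-based lookup numbers values in the same order,
# because unique() also preserves order of first appearance.
lookup = {name: idx for idx, name in enumerate(vuln_names.unique())}
via_lookup = vuln_names.map(lookup)

print(list(codes))
print(list(via_lookup))
```

Both approaches yield one small integer per category, which is what a categorical palette expects.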

In [6]:
edges = logs.copy() # Copy the original data to avoid unintended modifications.
# Set each edge's color to the value in the vulnerability lookup table
edges['ecolor'] = edges.vulnName.map(vulnerabilityToColorCode)
edges[:3]


Out[6]:
attackerIP victimIP victimPort vulnName count time(max) time(min) ecolor
0 1.235.32.141 172.31.14.66 139.0 MS08067 (NetAPI) 6 1.421434e+09 1.421423e+09 0
1 105.157.235.22 172.31.14.66 445.0 MS08067 (NetAPI) 4 1.422498e+09 1.422495e+09 0
2 105.186.127.152 172.31.14.66 445.0 MS04011 (LSASS) 1 1.419966e+09 1.419966e+09 1

In [7]:
# Finally, add the binding of ecolor to edge colors and plot.
# Plot with g2 (not g) so that the new edge_color binding is actually used.
g2 = g.bind(edge_color='ecolor')
g2.plot(edges)


Out[7]:

Controlling Node Attributes by Creating a Node Table

To set the sizes and colors of nodes, we need to create a node table in which each node is represented by a row.

  1. We gather a list of all nodes by concatenating the unique values of the source and destination columns of the edge table. This lists our node identifiers and will be the first column of the node table.
  2. Then we add a column to the node table for each visual attribute, such as color or size.
  3. Finally, we tell the plotter which column to bind as the node identifier and which columns to bind to the desired visual attributes.

We proceed in a few steps: collect all attacker IPs and color them red, collect all victim IPs and color them yellow, and then concatenate the IPs together into one table.


In [8]:
#Create the table of attackers. Our node identifier column will be called "IP".
attackers = edges.attackerIP.to_frame('IP')
attackers['type'] = 'attacker'
attackers['pcolor'] = 67006  #red
attackers[:3]


Out[8]:
IP type pcolor
0 1.235.32.141 attacker 67006
1 105.157.235.22 attacker 67006
2 105.186.127.152 attacker 67006

In [9]:
# Same steps, but for victims (destinations)
victims = edges.victimIP.to_frame('IP')
victims['type'] = 'victim'
victims['pcolor'] = 67001  #yellow
victims[:3]


Out[9]:
IP type pcolor
0 172.31.14.66 victim 67001
1 172.31.14.66 victim 67001
2 172.31.14.66 victim 67001

In [10]:
#Combine the two tables
#If an IP is both an attacker and a victim, prioritize coloring it as an attacker
nodes = pandas.concat([attackers, victims], ignore_index=True).drop_duplicates('IP')
nodes[:4]


Out[10]:
IP type pcolor
0 1.235.32.141 attacker 67006
1 105.157.235.22 attacker 67006
2 105.186.127.152 attacker 67006
3 105.227.98.90 attacker 67006
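The attacker-first priority relies on drop_duplicates keeping the first occurrence of each IP by default. The sketch below checks this on a made-up edge table (toy_edges, with invented IPs) in which one address appears as both attacker and victim.

```python
import pandas

# Made-up edges: 10.0.0.1 attacks one host and is itself attacked by 10.0.0.2.
toy_edges = pandas.DataFrame({
    'attackerIP': ['10.0.0.1', '10.0.0.2'],
    'victimIP':   ['10.0.0.9', '10.0.0.1'],
})

attackers = toy_edges['attackerIP'].to_frame('IP')
attackers['type'] = 'attacker'

victims = toy_edges['victimIP'].to_frame('IP')
victims['type'] = 'victim'

# Attackers are listed first, and drop_duplicates keeps the first row per IP,
# so an IP that plays both roles stays labeled as an attacker.
nodes = pandas.concat([attackers, victims], ignore_index=True).drop_duplicates('IP')
node_types = dict(zip(nodes['IP'], nodes['type']))

print(node_types)
```

Swapping the order of the two tables in the concat call would give victims priority instead.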

In [11]:
# We can now pass both the edge and node tables to "plot".
g2.bind(node='IP', point_color='pcolor').plot(edges, nodes)


Out[11]:

Exploring Graphs Interactively: Summarize, Filter, Drill Down, and Compare

Within the visualization, you can now filter and drill down into the graph.

For cool results, try to:

  • Open the histogram panel, and add histograms for victimPort, vulnName, and count. By selecting a region of a histogram or clicking on a bar, you can filter the graph. For instance, we see that the NetAPI vulnerability has the biggest bar and is therefore the most common vulnerability. By clicking on its bar and filtering to only those entries, we see that it is only present in the big cluster of attacks against IP 172.31.14.66. (Click again to remove the filter.)
  • With the histogram panel open, click on data brush and then lasso a selection on the graph. The histograms highlight the subset of nodes under the selection. You can drag the data brush selection to compare different subgraphs. For example, we see that the attackers did not find many vulnerabilities in the smaller part of the honeypot.
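The histogram-and-filter workflow above can be roughly approximated offline in pandas. This is only a sketch on made-up rows (toy_logs, including the invented address 172.31.14.65), not a reproduction of the interactive panel or of the real dataset.

```python
import pandas

# Made-up log rows with the same column names as the honeypot data.
toy_logs = pandas.DataFrame({
    'victimIP': ['172.31.14.66', '172.31.14.66', '172.31.14.66', '172.31.14.65'],
    'vulnName': ['MS08067 (NetAPI)', 'MS08067 (NetAPI)',
                 'MS04011 (LSASS)', 'MaxDB Vulnerability'],
})

# "Histogram" of a categorical column: number of rows per vulnerability name.
vuln_histogram = toy_logs['vulnName'].value_counts()

# "Click on the biggest bar": keep only rows with the most common vulnerability.
most_common = vuln_histogram.idxmax()
filtered = toy_logs[toy_logs['vulnName'] == most_common]

# Which victims remain after filtering?
victims_hit = filtered['victimIP'].unique().tolist()

print(most_common, victims_hit)
```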

Going Further

In the next part of the tutorial, we show how to:

  1. Create multiple graph views of the same data
  2. Aggregate multi-edges into bundles